Add multi-algorithm deterministic cuDNN convolutions #34951
Conversation
// A helper function to decide whether to enable deterministic cuDNN
// functionality.
bool RequireCuDNNDeterminism();
It seems weird for TF and XLA to ask for flags from StreamExecutor, as StreamExecutor is just a library, and it shouldn't impose constraints on its users about how to use it.
Ideally, the source of truth for "users require determinism" should be a flag in TF, which then gets passed down to XLA. Both TF and XLA would pass the same boolean down to StreamExecutor. But this ideal solution creates an unnecessary amount of work.
How about this:
- Create a TF flag `TF_DETERMINISTIC_AUTOTUNING` in `tensorflow/core/kernels/gpu_utils.h`, which is only used by TF kernels.
- Create a similar XLA flag in `DebugOptions` in `tensorflow/compiler/xla/xla.proto`, which is used by XLA.
- Keep the current StreamExecutor flags as they are, privately in `cuda_dnn.cc`.

When the end user wants determinism, they have to compose all the necessary flags. This way, at least the layering is less coupled.
Hi Tim,

Adding and Modifying Switches

I strongly prefer not to add additional determinism switches or change the functionality of the existing switches. `TF_CUDNN_DETERMINISTIC` makes cuDNN-based ops function deterministically, and `TF_DETERMINISTIC_OPS` aspires to making all ops, including those based on cuDNN, operate deterministically. Since a model either operates deterministically or not (there being no useful gray area), `TF_DETERMINISTIC_OPS` supersedes, and ultimately makes fully redundant, `TF_CUDNN_DETERMINISTIC`.

At the moment `TF_DETERMINISTIC_OPS` is being listened to, like a global variable, in both Python code and C++ code. I'm currently working on a deterministic mode for another op, which will also be enabled by `TF_DETERMINISTIC_OPS`. In time, `TF_DETERMINISTIC_OPS` will be referenced in more and more places in the code base. I envision the functionality of `TF_DETERMINISTIC_OPS` eventually being replaced by a fully plumbed-in config variable.

I have been communicating the status and recipes for TensorFlow determinism here, and I don't want to make that information unnecessarily complicated for the user. Note that I intend to document TensorFlow determinism at a high level in the official TensorFlow documentation soon.
Ideal Solution

I'm not sure what you mean by "a flag in TF." Perhaps you're referring to adding something to `tf.config`? Or are you just referring to what is represented by `RequireCuDNNDeterminism()` (the logical OR of `TF_CUDNN_DETERMINISTIC` with `TF_DETERMINISTIC_OPS`)?
Following your outline above, I'm thinking that a practical way forward might be to move `RequireCuDNNDeterminism()` from `tensorflow/stream_executor/cuda/cuda_helpers.h/.cc` to `tensorflow/core/kernels/gpu_utils.h/.cc` and use it where needed in `tensorflow/core/kernels/*`, including passing it down in the interface to the `GetConvolve*Algorithms()` methods in the stream executor.

I currently don't know how to pass this flag down to XLA, but I could work on figuring that out. Then XLA could also use it and pass it to the `GetConvolve*Algorithms()` methods in the stream executor.
Compromise

Instead of passing the flag down to XLA, would it be acceptable to you if I replicated the `RequireCuDNNDeterminism()` code (which looks at both `TF_CUDNN_DETERMINISTIC` and `TF_DETERMINISTIC_OPS`) in `tensorflow/compiler/xla/service/gpu/gpu_conv_algorithm_picker.cc`? There, the setting can be both used and passed down to the `GetConvolve*Algorithms()` methods in the stream executor.
I just pushed a commit that removes the dependency of both `core/kernels:gpu_utils` and `xla/service/gpu:gpu_conv_algorithm_picker` on `stream_executor`. This has been done simply by replicating the code that listens for, and caches, `TF_CUDNN_DETERMINISTIC || TF_DETERMINISTIC_OPS` in both `core/kernels/gpu_utils.cc` and `xla/service/gpu/gpu_conv_algorithm_picker.cc`.

I considered passing a `require_determinism` flag through the parameter lists of the `CudnnSupport::GetConvolve*Algorithms()` methods (defined in `stream_executor/cuda/cuda_dnn.cc`), rather than generating the flag locally within `cuda_dnn.cc`, but realized that the flag was also needed for selecting the deterministic operation of cuDNN max pooling, which is also done in `cuda_dnn.cc`. There was no easy way to propagate the flag in that case.

So currently, the flag is generated independently, and equivalently, in three different places. To prevent code repetition, I would prefer for it to be in a library that can be included by StreamExecutor, core, and XLA. I don't know what the right library for that would be, however. I originally, incorrectly, chose `stream_executor/cuda:cuda_helpers` because of my lack of understanding of the architecture of TensorFlow.
To re-summarize my intentions:

1. I want to keep the API for determinism as simple as possible for users, which means not changing the meaning or functionality of `TF_CUDNN_DETERMINISTIC` and `TF_DETERMINISTIC_OPS`, and also extending the promise that they represent. This pull request represents a bug fix that requires those environment variables to directly control code in XLA and `core/kernels`.

2. For this development work, I think it makes sense to make the changes wherever necessary to attain the deterministic functionality as quickly as possible, while maintaining a simple and consistent API. Going forward, this is mostly going to appear as different parts of the codebase listening directly to `TF_DETERMINISTIC_OPS`. Ultimately, either I, or someone else, can go through and properly plumb the switch into all the places that have been highlighted by the use of `TF_DETERMINISTIC_OPS`, possibly from something like `tf.config.deterministic_ops`.
Assuming that it would be preferable to define `RequireCudnnDeterminism()` once, in one place, I've been looking for an appropriate place to do that. I'm wondering if `tensorflow/core/common_runtime/gpu` would make sense. I'm thinking that it could go into a new "module" (.h + .cc) called `gpu_determinism`. Would it be okay for StreamExecutor, core, and XLA to import and use this?
I think I've found another (even more?) appropriate place to define `RequireCudnnDeterminism()`: `tensorflow/core/util:use_cudnn` (use_cudnn.h / use_cudnn.cc).
@duncanriach, thanks for looking at these solutions!
I don't have a sense of the priority of this PR, so I'm happy to defer the call to you. Depending on the priority, I'm fine either with holding this PR (as you seem to suggest) or with querying the env var in multiple places, with comments describing the migration path (your original commits).
If you're willing to take it, and it seems that you are, then I would much prefer to have a bug fix in place immediately, by querying the env var in multiple places (as specified by the current commits).

Please will you clarify what you mean by "comments describing the migration path"? Do you mean adding comments explaining the intention to migrate to `tf.config.experimental.deterministic_ops` and associated plumbing? If so, I would gladly add that.

Also, are you happy with having the code defined by `RequireCudnnDeterminism()` (and the migration-plan comment) replicated in three different places in the codebase, or would you prefer, as I would, for it to be defined in one place, such as `tensorflow/core/common_runtime/gpu:gpu_determinism`? Refactoring that would be an easy and quick change to make.
> If you're willing to take it, and it seems that you are, then I would much prefer to have a bug fix in place immediately, by querying the env var in multiple places (as specified by the current commits).

Yes, this is what I meant.

> Please will you clarify what you mean by "comments describing the migration path"? Do you mean adding comments explaining the intention to migrate to `tf.config.experimental.deterministic_ops` and associated plumbing? If so, I would gladly add that.

Yes, plus verbal warning bits like "this is a temporary solution".

> Also, are you happy with having the code defined by `RequireCudnnDeterminism()` replicated in three different places in the codebase, or would you prefer, as I would, for it to be defined in one place, such as `tensorflow/core/common_runtime/gpu:gpu_determinism`? Refactoring that would be an easy and quick change to make.

I prefer to duplicate it in several places, each with the comment we talked about above, plus "this code is duplicated, and should be kept in sync with [other files]". I don't expect frequent changes to these duplicates, so keeping them in sync shouldn't be too much work.
Got it. Yes, it's very unlikely that the code will ever need to change (until it's removed). I'll work on an incremental commit to address these items. Thank you, Tim.
@timshen91, The most recent commit adds a detailed comment to all replicas of the function.
@duncanriach Can you please check the reviewer comments and keep us posted? Thanks!
@gbaned. I have responded to the reviewer. Please remove
Changes pushed, and discussed here.
@gbaned, please will you add the
History:
…onfig plus plumbing
…tic-cudnn-convolutions PiperOrigin-RevId: 291684013 Change-Id: I818177de66eeec3dd52e276a5894a1d7a7166459
This change had to be rolled back. It seems one of our test targets became flaky with this CL.
@akuegel, oh no! Could you tell me which test target became flaky? (adding link to roll-back commit here)
It is an internal target, but AFAICT the target was always flaky. I'm now following up with the team internally.
Awesome. Thank you, @sanjoy!
I will try to roll forward again.
Thank you, @akuegel.
…deterministic-cudnn-convolutions PiperOrigin-RevId: 292501090 Change-Id: I31fd5aa4ed36c2929f1300250352781fca749f37
Hi @duncanriach, I think your pull request introduced non-deterministic behavior in TensorFlow 2.5.0. For example, I launched TensorFlow twice to do inference for my convolution graph, using the following environment variables:

On the first run, cuDNN autotuning gave the following convolution algorithms: Do you happen to have a workaround to avoid this non-deterministic behavior? Is turning off autotuning via `TF_CUDNN_USE_AUTOTUNE`, and relying on `cudnnGetConvolutionForwardAlgorithm_v7` to select the convolution algorithm, a feasible solution? Thanks in advance.
Hi @Leiwu-Zheng, I believe that the code to deterministically select cuDNN convolution algorithms has evolved significantly since this pull request. @kaixih, please could you comment on this?
Is this for the TF2 code? For the cuDNN frontend API, we will use the first working deterministic engine when
@Leiwu-Zheng, @kaixih and I are working on this together. We'll get back to you, but it will probably be in the new year (2022). CCing @reedwm as well. Reed, my understanding is that @Leiwu-Zheng is setting

Wait, what version of TensorFlow are you using, @Leiwu-Zheng? According to my notes, in version 2.0, you'll need to use

Actually, scratch all of that. @Leiwu-Zheng, please will you open an issue with a clear and well-contained reproducer of the problem you're witnessing.
@duncanriach, thanks for the investigation. I have updated my previous comment to make it clearer. I will open an issue. Currently, I am looking for a workaround so that version 2.5.0 can generate deterministic results for both training and inference, but TF_USE_DEFAULT_CONV_ALGO=1 only works for inference. Update:
@Leiwu-Zheng, thank you for looking into this more deeply and for discovering this work-around. cuDNN convolution algorithm selection should be deterministic when deterministic ops are enabled (via

I think this is a bug. Please will you confirm that this issue exists in the latest release (version 2.7, if possible) and open an issue (after attempting to confirm that one does not already exist for this). Provide a simple-as-possible, well-contained reproducer that demonstrates this issue on a specific version of TensorFlow. Please tag me, @reedwm, and @kaixih in that new issue, and reference this discussion. Thank you for doing all this work, Leiwu.
This current pull request intends to address a bug in the functionality of the environment variable `TF_CUDNN_DETERMINISTIC` (see PR 24747) and also the environment variable `TF_DETERMINISTIC_OPS` (see PR 31465).

The current implementation (before application of this current pull request) of deterministic cuDNN convolution in TensorFlow chooses, for any layer configuration, one fixed deterministic algorithm for each of the forward and two backward propagation paths.

I have since come to appreciate that each algorithm is not guaranteed to work on all layer configurations. The solution represented by this current PR addresses that problem. It uses the existing auto-tuning mechanism to attempt to use all of the available deterministic algorithms and then, instead of choosing the fastest one (as regular auto-tuning does), chooses the first deterministic algorithm that works. It does this in a deterministic way, based on the algorithm's index in an intentionally ordered array.